
DAOS-18587 chk: handle report upcall failure - b26#17557

Merged
daltonbohning merged 1 commit into release/2.6 from
Nasf-Fan/DAOS-18587_b26
Mar 2, 2026

Conversation

@Nasf-Fan
Contributor

Whenever the DAOS engine logic needs to interact with the admin, it generates a new interaction record in the chk_instance::ci_pending_hdl tree and then triggers a dRPC upcall to the control plane, which may fail for some reason. On failure, the dRPC sponsor needs to remove that record from the chk_instance::ci_pending_hdl tree before destroying it, to avoid triggering a spurious assertion.
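The cleanup order described above can be illustrated with a minimal, self-contained sketch. It is not the DAOS code: `pending_rec`, `pending_set`, `report_with_upcall`, and the upcall callbacks are hypothetical names, and a plain linked list stands in for the chk_instance::ci_pending_hdl tree. The assertion in `pending_destroy` plays the role of the assertion that would otherwise fire if a still-linked record were destroyed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interaction record (not the DAOS struct). */
struct pending_rec {
	unsigned long       seq;
	bool                linked; /* true while the record is in the set */
	struct pending_rec *next;
};

/* Hypothetical stand-in for the pending-record tree (a plain list here). */
struct pending_set {
	struct pending_rec *head;
};

static void pending_add(struct pending_set *set, struct pending_rec *rec)
{
	rec->next   = set->head;
	set->head   = rec;
	rec->linked = true;
}

static void pending_del(struct pending_set *set, struct pending_rec *rec)
{
	struct pending_rec **pp;

	for (pp = &set->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == rec) {
			*pp         = rec->next;
			rec->next   = NULL;
			rec->linked = false;
			return;
		}
	}
}

static void pending_destroy(struct pending_rec *rec)
{
	/* Destroying a record that is still linked would be a bug. */
	assert(!rec->linked);
	free(rec);
}

/* Hypothetical upcall signature: returns 0 on success, non-zero on failure. */
typedef int (*upcall_fn)(unsigned long seq);

static int report_with_upcall(struct pending_set *set, unsigned long seq,
			      upcall_fn upcall)
{
	struct pending_rec *rec = calloc(1, sizeof(*rec));
	int                 rc;

	if (rec == NULL)
		return -1;

	rec->seq = seq;
	pending_add(set, rec);

	rc = upcall(seq);
	if (rc != 0) {
		/* Unlink first, then destroy, so the assertion holds. */
		pending_del(set, rec);
		pending_destroy(rec);
	}
	return rc;
}

/* Sample upcalls for exercising both paths. */
static int upcall_ok(unsigned long seq)   { (void)seq; return 0; }
static int upcall_fail(unsigned long seq) { (void)seq; return -5; }
```

In this sketch a failed upcall leaves the set empty and frees the record, while a successful upcall leaves the record linked for later handling.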

The patch also fixes a container label check issue: if the label is passed as a d_iov_t instead of a string, the buffer may not be '\0'-terminated, so the check needs to take the buffer length into account.
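A length-aware label comparison might look like the following sketch. It is an illustration under stated assumptions rather than the patch itself: `label_buf` is a hypothetical type loosely modeled on a buffer-plus-length pair, and `label_matches` is a made-up helper. The point is that the comparison never reads past the reported length, whether or not a terminator is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer-plus-length pair; not the real d_iov_t layout. */
struct label_buf {
	const void *buf;
	size_t      len; /* number of valid bytes in buf */
};

/*
 * Compare a possibly unterminated label against a C string.
 * The label's length is the offset of the first '\0' within the
 * valid bytes, or the full buffer length if no '\0' is present.
 */
static bool label_matches(const struct label_buf *iov, const char *expected)
{
	const char *nul;
	size_t      exp_len;
	size_t      got_len;

	assert(expected != NULL);

	if (iov == NULL || iov->buf == NULL)
		return false;

	/* memchr never looks beyond iov->len bytes. */
	nul     = memchr(iov->buf, '\0', iov->len);
	got_len = (nul != NULL) ? (size_t)(nul - (const char *)iov->buf) : iov->len;
	exp_len = strlen(expected);

	return got_len == exp_len && memcmp(iov->buf, expected, exp_len) == 0;
}
```

Calling `strlen` or `strcmp` directly on the buffer would be unsafe here, because an unterminated buffer gives those functions no stopping point.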

Test-tag: recovery

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@github-actions

Ticket title is 'CR - dmg check start causes engine crash on Aurora'
Status is 'In Progress'
Labels: 'catastrophic_recovery,test_2.8'
https://daosio.atlassian.net/browse/DAOS-18587

Whenever the DAOS engine logic needs to interact with the admin, it
generates a new interaction record in the chk_instance::ci_pending_hdl
tree and then triggers a dRPC upcall to the control plane, which may
fail for some reason. On failure, the dRPC sponsor needs to remove that
record from the chk_instance::ci_pending_hdl tree before destroying it,
to avoid triggering a spurious assertion.

The patch also fixes a container label check issue:
if the label is passed as a d_iov_t instead of a string, the buffer
may not be '\0'-terminated, so the check needs to take the buffer
length into account.

Test-tag: recovery

Signed-off-by: Fan Yong <fan.yong@hpe.com>
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-18587_b26 branch from d3329c4 to 9a1efff Compare February 13, 2026 15:24
@daosbuild3
Collaborator

@Nasf-Fan
Contributor Author

Nasf-Fan commented Feb 14, 2026

Passed all required CI tests. The NLT failure is unrelated to the patch.

@Nasf-Fan Nasf-Fan marked this pull request as ready for review February 14, 2026 15:01
@Nasf-Fan Nasf-Fan added release-2.6.5 unclean-cherry-pick Indicates that a cherry-pick had merge conflicts that needed resolving. labels Feb 26, 2026
@Nasf-Fan Nasf-Fan added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Mar 1, 2026
@Nasf-Fan
Contributor Author

Nasf-Fan commented Mar 1, 2026

Ping reviewers, thanks!

@Nasf-Fan Nasf-Fan requested a review from a team March 2, 2026 03:06
@daltonbohning daltonbohning merged commit b9c88d0 into release/2.6 Mar 2, 2026
41 of 43 checks passed
@daltonbohning daltonbohning deleted the Nasf-Fan/DAOS-18587_b26 branch March 2, 2026 15:55

Labels

forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. release-2.6.5 unclean-cherry-pick Indicates that a cherry-pick had merge conflicts that needed resolving.


5 participants